0.  Introduction

This  is  the  final  report  for  the  AFOSR  Grant  “Distributed  control  for  networked  systems  with  non- 
traditional  communication  constraints:  Lossy  links,  power  and  usage  limitations,  and  induced  cooperation,” 
covering  the  period  March  1,  2009  -  November  30,  2011.  The  report  is  comprised  of  5  sections.  Section  1 
provides  an  overview  of  the  relevant  research  area  and  the  issues  addressed  by  the  PI  during  the  period  of  the 
Grant.  Section  2  describes  the  research  conducted  and  the  results  obtained,  broken  down  into  7  subsections,
each  one  covering  a  specific  subarea.  After  a  brief  conclusions  section  (Section  3),  the  report  contains  a  list 
of  plenary  talks  and  special  colloquia  given  by  the  PI  on  topics  covered  by  the  Grant  (Section  4),  and  a  list 
of  publications  and  conference  presentations  resulting  from  this  research,  along  with  additional  bibliography 
cited  in  the  report  (Section  5). 

1.  Overview  of  the  Research  Area  and  Issues  Addressed 

As  wireless  sensing  and  control  become  increasingly  applicable  in  fields  ranging  from  real-time  alarm  systems 
and  vehicle  systems  to  aeronautical  guidance  and  formation  control,  the  need  for  establishing  a  theoretical 
foundation  for  what  is  known  as  networked  systems  [R1]  has  grown  likewise.  Such  systems  have  sensors  and
controllers  distributed  generally  in  an  ad-hoc  manner,  but  have  to  be  connected  virtually  either  through 
communication  and  information  transmission  or  because  of  the  need  to  achieve  some  level  of  performance 
driven  by  individual  or  common  goals,  or  both.  Any  effective  effort  to  develop  a  theoretical  foundation  for 
this  relatively  new  paradigm  necessitates  pooling  together  of  tools  (both  conceptual  and  algorithmic)  from 
multiple  seemingly  disparate  disciplines,  such  as  control  theory,  information  theory,  coding,  communication, 
computing,  and  game  theory.  Some  salient  aspects  of  this  paradigm,  and  the  challenging  issues  that  arise  in 
this  context,  which  we  have  addressed  in  the  research  supported  by  this  AFOSR  grant,  are  as  follows,  where 
the  generic  term  agents  is  used  for  entities  that  are  responsible  for  decision  making  that  leads  to  actions,  be 
they  sensors  or  controllers  or  even  dynamical  systems. 

•  Incomplete  information.  In  most  networked  systems,  the  agents  do  not  have  access  to  the  complete 
map  of  the  network  on  which  they  operate,  and  know  only  the  presence  of  their  neighbors  with  whom  they 
interact.  Incompleteness  of  information  (which  is  different  from  receiving  noisy  information  or  measurements)
necessitates  that  each  agent  develop  a  subjective  probabilistic  model  of  the  network  and  update  its
model  (through  learning)  as  more  information  becomes  available.  One  of  the  questions  that  arise  in  this 
connection  is  whether  such  iterations  converge  and  if  they  do  whether  they  converge  to  the  same  limiting 
model  (consensus).  Another  question  is  the  dependence  of  the  various  issues  listed  below,  and  particularly
the  sensitivity  of  the  performance  attainable  for  the  network,  on  inaccuracies  in  the  modeling  by  different
agents.  In  other  words,  what  is  the  cost  of  incompleteness  of  information?

•  Decentralization.  Even  if  all  agents  share  the  same  (probabilistic)  model  of  the  network  on  which 
they  operate,  the  on-line  measurements  they  make  and  the  information  they  receive  are  not  broadcast  and 
hence  are  not  centrally  available.  Decentralized  optimal  decision  making  and  control  problems,  particularly 
in  stochastic  environments,  are  known  to  be  notoriously  difficult  (not  only  to  solve  numerically,  but  even
conceptually)  because  of  the  tradeoff  between  two  generally  conflicting  roles  of  a  control/decision  policy  in  such
systems  with  nonclassical  information:  contributing  toward  performance  improvement  at  the  present  versus 
carrying  useful  information  into  future  stages  which  could  potentially  lead  to  better  overall  performance 
[R2].  This  information-carrying  capability  of  control  laws  lies  at  the  heart  of  the  difficulty  in  constructing
optimal  policies  when  different  agents  operating  on  the  same  network  do  not  share  the  same  information  or 
the  same  measurement  channels.  One  may  argue  that  the  current  sensor  technology  partially  alleviates  this 
problem  by  making  it  possible  for  sensors  linked  to  individual  control  channels  or  agents  to  communicate 
with  each  other  and  transfer  the  necessary  information  so  that  all  needed  data  are  eventually  shared  by  all 
relevant  parties.  However,  what  emerges  is  still  not  a  centralized  control  problem  because  of  the  power  and 
bandwidth  limitations  inherent  to  sensor  communication  and  the  presence  of  time  delay(s)  in  the  sharing  of 
information  in  a  distributed  system.  There  have  been  some  success  stories  in  using  some  indirect  methods, 
particularly  information  theory  based  bounds,  in  solving  optimal  decision  problems  with  nonclassical
information  [R3],  but  the  field  is  still  in  its  infancy.  Even  though  a  general  theory  for  optimal  control
problems  with  general  nonclassical  information  may  be  out  of  reach,  the  problem  may  still  be  tractable  for
some  special  structures,  as  in  [R3]  and  [R2],  albeit  with  some  nontraditional  tools.  Carving  out  such
special  structures  rooted  in  real  applications,  and  generating  new  approaches  for  these  problems,  has  been
one  of  the  goals  of  our  research. 

•  Bandwidth  limitations.  Most  networked  systems  have  digital  (wired  or  wireless)  links  between  the 
plant,  the  controllers,  and  the  sensors,  which  are  necessarily  bandwidth  limited.  To  match  the  input  to  a 
channel’s  characteristics  one  has  to  pass  the  signal  through  a  sampler  (if  it  is  continuous  time)  followed  by  a 
quantizer  and  an  encoder.  At  the  output  of  the  channel  the  message  will  have  to  be  decoded,  de-quantized, 
and  passed  through  a  sample-hold  to  lift  it  back  to  a  continuous-time  and/or  continuous-alphabet  signal  or
message.  The  questions  that  arise  here  at  the  front  end  are  how  to  sample  (time  invariant  or  time  varying) 
and  at  what  rate  (variable  or  constant);  how  to  quantize  (uniform,  logarithmic,  etc.)  under  given  channel
capacity  constraints;  and  how  to  encode  (time  invariant,  time  varying,  memoryless,  with  memory,  etc.),  so
that  certain  performance  guarantees  (including  stability)  can  be  assured.  The  presence  of  the  feedback  loop 
and  the  natural  requirements  of  real-time  implementability  and  causality  on  the  designs  make  standard
coding  theory  techniques  inapplicable  in  these  scenarios.  Even  though  we  have  not  addressed  these  problems 
in  our  research  under  this  grant,  it  is  worth  noting  that  some  strides  have  been  made  here  by  many  (including 
us;  [R4]-[R6])  during  the  last  decade,  which  have  however  only  scratched  the  surface,  still  leaving  several  open
problems,  such  as  when  channels  are  noisy  [R7]  (not  quantization  noise)  or  when  controls  are  distributed 
and  information  is  decentralized  [R8]-[R10]  in  which  case  signaling  through  control  plays  an  important  role 
in  enhancing  information  flow  in  the  network. 

•  Usage  limitations  and  sharing  of  resources.  Another  type  of  a  communication  constraint  arises 
when  there  are  restrictions  on  channel  usage  and/or  on  the  frequency  of  interactions  between  neighboring 
agents.  The  former  arises  when,  for  example,  communication  or  information  transmission  is  carried  out  using 
wireless  devices  which  are  naturally  power  limited  due  to  battery  lifespan.  This  requirement  imposes  severe 
restrictions  on  the  duration  of  time  the  wireless  device  can  be  on/awake  and  the  number  of  transmissions  it 
can  make.  This  is  because  the  radio  frequency  (RF)  communication  consumes  a  significant  portion  of  the 
battery  power  when  the  wireless  unit  is  awake.  Therefore,  life  of  the  wireless  device  can  be  lengthened  by 
optimizing  the  duty  cycle  (or  reporting  frequency)  of  the  unit  as  well  as  by  transmitting  data  only  when  it 
is  necessary.  This  type  of  limitation  brings  in  a  constraint  on  the  decision  making  function  of  sensors  and 
controllers,  which  is  not  of  the  standard  type.  Some  of  the  questions  that  arise  in  this  context  are  [R11]-[R14]:
How  does  one  optimally  schedule  transmission  of  data/information  from  one  agent  (such  as  a  sensor)
to  another  (say  a  controller)  on  a  network  under  a  constraint  on  the  frequency  of  transmissions?  How  can
one  quantify  the  degradation  in  performance  due  to  such  constraints  on  the  number  of  interactions  between 
different  decision  units?  If  a  communication  link  is  used  by  multiple  sensors  and  multiple  controllers,  how 
should  transmissions  be  scheduled  (or  bandwidth  be  time-shared)  so  that  optimum  performance  is  achieved? 
A  related  question  is  [R15]:  If  both  sensing  and  control  are  to  be  executed  using  the  same  bandwidth- 
limited  channel,  how  should  the  resource  be  allocated  between  these  two  uses?  Should  a  higher  fraction  (of 
the  resource)  be  allocated  to  control/action  signal  transmission  or  to  receiving  more  data  to  improve  the 
performance  of  less  frequent  control  actions?  Toward  the  beginning  of  the  grant  period,  we  had  initiated 
research  on  these  problems  with  non-traditional  constraints,  and  since  then  we  have  made  significant  inroads 
into  the  underlying  challenges,  such  as  obtaining  recursively  computable  threshold-type  optimal  policies  for 
some  specific  models.  Details  are  given  in  the  next  section. 

•  Lossy,  leaky,  and  unreliable  links,  and  adversarial  interference.  In  a  network,  link  failures 
cause  the  information  flow  between  the  controller  and  the  plant  (or  in  general  between  two  or  more  agents)  to 
be  disrupted,  which  results  in  control  and/or  measurement  packets  being  lost.  This  disruption  of  communication
has  a  deteriorating  effect  on  the  networked  control  system  performance.  Therefore,  it  is  important  to
develop  an  understanding  of  how  much  loss  the  control  system,  or  the  network  in  general  can  tolerate  before 
the  system  becomes  unstable,  or  in  the  case  of  estimation  before  the  estimation  error  becomes  unbounded, 
or  in  the  case  of  a  general  network  before  overall  performance  falls  below  an  unacceptable  level.  Further,  if 
the  statistical  description  of  the  link  failure  process  is  given  a  priori,  a  problem  of  interest  is  to  determine 
the  optimal  control  and  estimation  policies  under  the  link  failure  constraints.  In  a  remote  control  setting, 
packet  losses  may  occur  both  from  the  sensor  to  the  controller,  and  from  the  controller  to  the  actuator.  In 
the  first  case,  the  measurement  packets  are  lost  and  therefore  the  controller  has  access  to  perfect  or  imperfect 
information  on  the  state  only  intermittently.  In  the  latter  case,  the  control  or  actuation  packets  are  lost, 
and  this  causes  the  actuator  to  have  access  to  the  controls  intermittently.  Optimal  control  of  stochastic
systems  in  a  networking  environment  under  such  unreliability  constraints  is  a  problem  of  great  relevance  and
importance  to  networked  systems.  We  had  initiated  research  on  this  class  of  problems  just  prior  to  the
start  of  the  Grant,  and  have  expanded  on  this  line  of  enquiry  during  the  course  of  this  Grant,  as  discussed
in  the  next  section.  We  note  that  unreliability  could  be  the  result  of  some  malfunction  or  fading,  which
admits  a  statistical  description  [R16],  [R17],  or  could  be  the  result  of  some  adversarial  malicious  attack,  as 
in  [R18],  or  both  [R19].  Further,  one  has  to  distinguish  between  the  case  where  the  sender  agent  (such  as  the 
controller)  receives  an  acknowledgment  for  each  control  packet  it  sends  to  the  receiving  agent  (such  as  the 
actuator),  and  the  case  where  it  does  not.  The  former  resembles  the  Internet,  with  the  acknowledgement
mechanism  built  into  TCP  (Transmission  Control  Protocol),  whereas  the  latter  resembles  a  best-effort,  or 
UDP  (User  Datagram  Protocol)  type  network.  In  our  fairly  recent  work  prior  to  the  start  of  this  Grant, 
we  had  shown  that  one  can  formulate  reasonable  and  tractable  stochastic  optimization  formulations  which 
capture  the  salient  aspects  underlying  these  problems,  and  this  had  opened  up  promising  avenues  for  research 
toward  developing  a  comprehensive  theory  for  such  systems.  One  of  these,  which  we  have  initiated  under 
this  Grant,  was  looking  at  stochastic  control  problems  under  usage  limitations  as  in  the  earlier  bullet,  where 
now  the  communication  links  connecting  different  agents  are  not  reliable,  with  unreliability  given  a  precise 
mathematical  description  in  terms  of  a  Bernoulli  process.  Another  one  on  which  we  have  obtained  substantial 
new  results  during  the  Grant  period  is  the  class  of  problems  where  there  is  adversarial  interference  on  the 
interactions  of  different  agents,  which  could  be  in  the  form  of  jamming  the  communication  links,  controlling 
link  failure  rates,  and  limiting  usage  of  the  channels.  These  are  further  discussed  in  the  next  section. 

•  Distributed  local-performance  driven  decision  making.  In  a  networked  system,  or  equivalently 
in  a  network  of  distributed  agents,  even  though  there  may  be  a  central  goal,  it  is  unrealistic  to  assume  that 
this  overall  goal  is  communicated  to  and  even  shared  by  all  individual  agents.  Instead,  agents  will  have  their 
individual  local  performance-driven  objectives,  and  will  carry  out  their  actions  computed  in  accordance  with
these  objectives  and  under  the  informational  and  communication  constraints  of  the  types  introduced  and 
discussed  above.  Further,  they  will  be  acting  non-cooperatively  (as  in  Nash  equilibrium),  even  though  the 
overall  mission  will  necessitate  team  coordination.  A  question  of  interest  here  is  whether,  in  the  absence  of  a 
higher-level  coordinator,  the  interactions  of  the  agents  under  the  given  informational  constraints  and  driven
by  their  local  objectives,  will  converge  to  a  Nash  equilibrium,  and  how  far  this  equilibrium  would  be  from  an 
efficient  solution  which  requires  collaboration  at  some  cost  of  additional  communication.  That  is,  what  is 
the  price  of  non-cooperative  behavior  that  is  due  to  localization  of  objectives?  Another  question  is:  If  there
is  a  common  underlying  goal  or  mission  (such  as  detection  of  a  target),  is  consensus  reached  by  the  agents, 
as  in  [R20]?  These  questions  have  been  addressed  during  the  course  of  this  Grant  within  the  context  of  the 
various  scenarios  introduced  above  (as  discussed  in  the  next  section). 

•  Coordination  through  multi-level  interaction.  The  discussion  above  has  assumed  that  all  agents 
on  a  network  are  same-level  decision  makers  or  computing  units,  and  that  there  is  no  higher  level  coordinator. 
The  presence  of  a  higher  level  coordinator  (or  one  agent  acting  as  such),  with  the  coordinator  sending 
appropriate  signals  to  lower-level  units  based  on  information  received  from  the  other  agents,  could  facilitate 
coordination.  This  could  in  fact  induce  all  agents  to  cooperate  as  though  they  were  acting  as  members  of  a 
team  working  toward  a  common  goal,  even  though  each  is  optimizing  its  own  objective  function,  as  shown 
in  the  context  of  a  network  services  problem  [R21].  But  signaling  is  costly,  and  effective  signaling  requires 
a  two-way  communication,  with  also  information  fed  from  the  agents  to  the  coordinator.  How  this
information  should  be  properly  aggregated  and  quantized  is  a  challenging  question  that  arises  here.  Another  related
one  is  whether  the  signaling  can  be  restricted  to  one-  or  two-bit  messaging,  as  in  [R22],  without  significant 
degradation  in  performance.  These  and  related  issues  have  also  been  addressed  in  our  research. 
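As a hedged illustration of the lossy-link question raised in the bullets above (how much loss an estimator can tolerate before its error becomes unbounded), consider a scalar plant x_{k+1} = a x_k + w_k whose state is sent to a remote estimator over a link that drops each packet independently with probability p_loss. The plant model, the names `a`, `q`, `p_loss`, and the assumption that a received packet reveals the state exactly are all assumptions made for this sketch, not the formulation of the cited papers. Under these assumptions the expected prediction-error variance obeys E[P_{k+1}] = p_loss a^2 E[P_k] + q, which stays bounded exactly when p_loss a^2 < 1.

```python
def expected_error_variance(a, q, p_loss, steps):
    """Iterate E[P_{k+1}] = p_loss * a^2 * E[P_k] + q, starting from P_0 = 0.

    a      : scalar plant gain (assumed model x_{k+1} = a x_k + w_k)
    q      : variance of the process noise w_k
    p_loss : probability that a measurement packet is dropped (Bernoulli)
    """
    variance = 0.0
    history = []
    for _ in range(steps):
        # A received packet resets the error to zero; a lost one lets it grow by a^2.
        variance = p_loss * a * a * variance + q
        history.append(variance)
    return history


def stays_bounded(a, p_loss):
    """The recursion above converges if and only if p_loss * a^2 < 1."""
    return p_loss * a * a < 1.0
```

For a = 2 the tolerable loss probability under this toy model is therefore anything below 0.25; with p_loss = 0.2 and q = 1 the recursion settles at q / (1 - p_loss a^2) = 5.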

2.  Description  of  Research  Conducted  and  Selected  Results 

Forty-six  publications  and  presentations  based  on  research  supported  by  the  Grant  are  listed  in  Section 
5,  in  reverse  chronological  order.  Here  we  discuss  some  salient  aspects  of  the  results  obtained,  organized  into 
seven  subsections. 

2.1.  Decision  making  in  adversarial  environments  with  limits  on  the  frequency  of  actions 

We  have  introduced  in  the  previous  section  the  rationale  for  placing  limits  on  the  number  of  times 
different  decision  units  in  a  distributed  network  would  interact  with  their  environment,  and  particularly 
with  other  decision  units.  These  problems  can  be  formulated  as  dynamic  stochastic  optimization  problems 
(in  discrete  time),  with  a  total  usage  constraint  (over  a  finite  horizon)  on  the  number  of  actions  by  each 
decision  unit.  One  specific  problem  that  we  have  looked  at  is  that  of  transmission  of  a  discrete-time  random 
process  (a  finite  string)  over  a  channel  with  the  goal  of  generating  an  accurate  estimate  at  the  other  end  (under 
minimum  mean  squared  error  (MMSE)  or  minimum  probability  of  error  (MPE)  criteria)  [11].  If  the  sequence 
is  of  length  N,  and  the  channel  can  be  used  only  M  <  N  times,  then  the  question  is  determination  of  the 
optimum  memoryless  transmission  (encoding)  policy  and  the  corresponding  optimum  estimator  (decoder) 
policy  under  these  constraints.  We  have  shown  that  the  optimum  transmission  policy  is  to  compare,
at  each  point  in  (discrete)  time,  the  observed  random  variable  in  the  sequence  against  a  time-varying,
pre-computable  threshold,  and  to  decide  whether  or  not  to  transmit  accordingly.  A  salient
characteristic  of  this  optimal  solution  is  that  even  when  transmission  does  not  take  place,  this  still  has  some 
information  content  regarding  the  true  value  of  the  random  variable  at  that  point  in  time,  which  can  be 
used  at  the  receiving  end  to  improve  the  estimate  of  the  random  sequence.  The  optimal  solution  can  also  be
seen  as  leading  to  an  event-driven  system,  with  the  events  being  associated  with  transmissions,  each  one  being
triggered  by  the  norm  of  the  random  variable’s  realized  value  exceeding  a  given  threshold.  The  difference 
from  a  more  standard  event-driven  system  though  is  that  here  the  trigger  mechanism  is  optimally  controlled 
and  hence  is  part  of  the  overall  design  process,  rather  than  being  exogenous  to  the  system. 
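The threshold structure described above can be sketched with a small dynamic program. This is only an illustration under simplifying assumptions that are ours and not necessarily those of [11]: the samples are i.i.d. standard Gaussian, the criterion is MMSE, the no-transmit region is symmetric about zero (so the decoder's estimate on silence is zero), and the function names are hypothetical. With k samples left and m transmissions left, transmitting costs value[k-1][m-1] and staying silent costs x^2 + value[k-1][m], so the sensor transmits when |x| exceeds the square root of the gap between the two continuation values.

```python
import math


def threshold_table(n_steps, budget):
    """Return (value, tau) for i.i.d. N(0, 1) samples under squared-error cost.

    value[k][m]: expected remaining squared error with k samples and m uses left.
    tau[k][m]  : transmit when |x| > tau[k][m] (math.inf means never transmit).
    """
    value = [[0.0] * (budget + 1) for _ in range(n_steps + 1)]
    tau = [[0.0] * (budget + 1) for _ in range(n_steps + 1)]
    for k in range(1, n_steps + 1):
        for m in range(budget + 1):
            if m == 0:
                value[k][m] = float(k)  # every remaining sample costs its variance
                tau[k][m] = math.inf
                continue
            gap = max(value[k - 1][m - 1] - value[k - 1][m], 0.0)
            t = math.sqrt(gap)
            p_silent = math.erf(t / math.sqrt(2.0))            # P(|x| <= t)
            pdf_t = math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)
            silent_cost = p_silent - 2.0 * t * pdf_t           # E[x^2; |x| <= t]
            value[k][m] = ((1.0 - p_silent) * value[k - 1][m - 1]
                           + p_silent * value[k - 1][m] + silent_cost)
            tau[k][m] = t
    return value, tau


def should_transmit(x, k_left, m_left, tau):
    """Threshold rule: compare the magnitude of the sample with the table entry."""
    return abs(x) > tau[k_left][m_left]
```

For example, with two samples and one transmission the sketch gives a threshold of 1 for the first sample and an expected total error of about 0.516, compared with 1 for the naive policy of always sending the first sample.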

We  have  subsequently  introduced  an  adversarial  component  into  the  problem,  not  in  an  estimation  but 
in  a  control  context:  in  a  standard  stochastic  networked  control  system  setting  there  is  an  additional  entity, 
a  strategic  adversary,  who  has  the  capability  to  jam  the  channel  that  connects  the  controller  to  the  plant  and
prevent  the  control  signal  from  reaching  the  plant  [32],  but  under  some  budget  constraints.  The  jammer  acts
only  intermittently,  limited  to  a  given  finite  number  (say  M)  of  actions  over  a  horizon  of  N  (N  >  M)  time
steps.  Such  a  constraint  is  introduced  to  capture  the  fact  that,  since  jamming  is  a  power  intensive  activity 
and  available  power  on-board  a  jammer  is  typically  limited,  continuous  action  throughout  the  entire  decision
horizon  is  not  possible.  Such  a  formulation  leads  to  a  dynamic  zero-sum  game  between  two  players,  in  this 
case  the  controller  and  the  jammer.  We  show  in  [32]  that  for  this  game  a  saddle-point  equilibrium  exists 
under  full  state,  total  recall  information  structure  for  both  players,  and  obtain  the  corresponding  control 
and  jamming  strategies.  The  nature  of  the  solution  is  such  that  the  jammer  acts  according  to  a  threshold 
policy,  which  means  that  at  every  time  step,  the  jammer  jams  if  and  only  if  a  particular  norm  of  the  plant’s 
state  is  larger  than  an  off-line  computable  and  time-varying  threshold.  We  derive  in  [32]  various  properties 
of  the  threshold  functions,  and  complement  the  study  with  numerical  simulations. 
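The structure of the jammer's threshold policy can be illustrated with a short simulation. This is a sketch under assumptions of our own: a scalar plant x_{k+1} = a x_k + u_k + w_k, a fixed linear feedback u_k = -gain * x_k standing in for the controller, and a list `thresholds` supplied by the caller as a placeholder for the off-line computed thresholds. It does not compute the saddle-point strategies of [32]; it only shows how a budget-limited jammer acts when it follows a time-varying threshold rule.

```python
import random


def simulate_jamming(a, gain, thresholds, budget, horizon, noise_std=1.0, seed=0):
    """Simulate a scalar loop in which a jammer may block at most `budget` controls.

    thresholds[k] is a hypothetical, caller-supplied threshold for time step k.
    Returns the list of time steps at which jamming occurred and the final state.
    """
    rng = random.Random(seed)
    x = 0.0
    remaining = budget
    jam_times = []
    for k in range(horizon):
        u = -gain * x
        # Threshold rule: jam only if budget remains and the state is large enough.
        if remaining > 0 and abs(x) > thresholds[k]:
            remaining -= 1
            jam_times.append(k)
            u = 0.0  # the jammed control packet never reaches the plant
        x = a * x + u + rng.gauss(0.0, noise_std)
    return jam_times, x
```

With all thresholds set to zero the jammer spends its budget at the first steps where the state is nonzero; with infinite thresholds it never acts, which recovers the unjammed loop.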

2.2.  Communication  jamming  and  allocation  of  resources,  with  application  in  formation  of 
UAVs 

In  a  series  of  papers  [1,  9,  14,  16,  23,  28,  29,  30,  37,  40,  42,  44],  we  have  brought  in  additional  elements 
into  the  jamming  scenario  above  such  as  mobility,  continuous  dynamics,  and  multiplicity  of  agents.  In  [1] 
and  [37],  we  have  investigated  a  jamming  attack  on  the  communication  network  of  a  team  of  unmanned 
aerial  vehicles  (UAVs)  flying  in  a  formation  under  a  communication  and  a  motion  model  for  the  UAVs, 
where  communication  is  essential  for  a  team  of  UAVs  to  sustain  formation.  We  formulated  the  problem  as  a 
zero-sum  pursuit-evasion  (P-E)  game  (not  between  two  players  as  in  standard  P-E  games,  but  between  two 
teams)  where  the  cost  function  is  the  termination  time  of  the  game,  with  termination  defined  as  breakdown 
of  communication  among  the  team  of  UAVs.  Using  the  framework  of  Isaacs,  we  have  obtained  motion 
strategies  for  the  UAVs  to  evade  the  jamming  attack,  and  have  also  provided  motion  strategies  for  aerial 
intruders  to  jam  the  communication  between  the  UAVs.  [37]  was  restricted  to  the  case  of  2  UAVs  and  a 
jammer,  and  [1]  provided  an  extension  to  multiple  UAVs  and  jammers.  And  in  [9],  we  extended  the  results 
to  a  scenario  involving  also  AGVs.  We  have  addressed  the  connectivity  maintenance  problem  also  in  [14],
but  now  when  two  mobile  autonomous  agents  (holonomic  agents)  navigate  in  an  environment  containing 
polygonal  obstacles.  One  of  the  agents,  the  pursuer,  is  assumed  to  follow  the  other  agent  so  as  to  maintain  a 
constant  line-of-sight,  which  is  the  path  of  the  dominant  signal.  The  other  agent  is  modeled  as  an  adversary 
that  tries  to  break  the  line-of-sight  with  the  pursuer.  Therefore,  the  problem  of  maintaining  a  healthy 
communication  link  has  been  modeled  as  a  visibility  based  pursuit-evasion  game,  where  we  have  adopted 
the  Rician  fading  model  for  the  communication  channel.  We  have  investigated  a  specific  kind  of  singular 
surface  that  appears  in  the  solution  to  the  underlying  pursuit-evasion  game,  namely  the  dispersal  surface.  In 
the  paper,  we  have  presented  construction  of  the  projection  of  several  dispersal  surfaces  for  various  obstacle 
geometries  by  fixing  the  initial  position  of  the  evader.  Further,  we  have  carried  out  several  numerical  simulations
for  specific  environments  containing  obstacles. 

We  have  considered  in  [44]  again  the  problem  of  maintaining  connectivity  in  a  network  of  mobile  agents 
in  formation  in  the  presence  of  a  jammer,  but  now  from  a  control-theoretic  point  of  view.  For  the  underlying 
differential  game,  we  have  obtained  a  complete  set  of  necessary  conditions  for  optimal  controls  for  each 
agent.  One  novelty  is  the  introduction  of  a  model  that  constructs  a  state-dependent  graph  based  on  the 
state-space  of  the  agents.  We  use  tools  from  algebraic  graph  theory  on  the  state-dependent  graph  in  order  to
provide  locally  optimal  control  laws  for  the  agents  in  the  formation.  Simulations  validate  the  control  scheme 
introduced  in  the  paper. 
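To make the notion of a state-dependent graph concrete, the following sketch builds a graph Laplacian from agent positions and estimates its second-smallest eigenvalue (the algebraic connectivity), which is positive exactly when the graph is connected. The disc model (two agents are linked when they are within `comm_range` of each other), the function names, and the use of shifted power iteration are assumptions made for illustration; the model in [44] may differ.

```python
import math


def state_dependent_laplacian(positions, comm_range):
    """Laplacian of the graph that links agents closer than comm_range (assumed disc model)."""
    n = len(positions)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= comm_range:
                lap[i][j] = lap[j][i] = -1.0
                lap[i][i] += 1.0
                lap[j][j] += 1.0
    return lap


def algebraic_connectivity(lap, iterations=500):
    """Estimate the second-smallest Laplacian eigenvalue by shifted power iteration.

    Iterates with (shift * I - L) on the subspace orthogonal to the all-ones vector.
    The fixed start vector is a simplification: for highly symmetric layouts it may
    be orthogonal to the relevant eigenvector, so treat the result as an estimate.
    """
    n = len(lap)
    if n < 2:
        return 0.0
    shift = 2.0 * max(lap[i][i] for i in range(n)) + 1.0  # exceeds the largest eigenvalue
    v = [float(i) for i in range(n)]
    lam = 0.0
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]                 # project out the all-ones direction
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        lv = [sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * lv[i] for i in range(n))  # Rayleigh quotient
        v = [shift * v[i] - lv[i] for i in range(n)]
    return lam
```

Three agents on a line at unit spacing with range 1.5 form a path graph, for which the sketch returns approximately 1; moving the third agent out of range drives the value to 0, signalling loss of connectivity.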

In  another  paper,  [17],  we  again  investigate,  but  from  a  different  perspective,  a  jamming  attack  on 
the  communication  network  of  a  multi-agent  system  in  a  formation  and  formulate  the  problem  as  a  zero- 
sum  pursuit-evasion  game.  In  the  models  of  [1]  and  [37],  we  had  used  Isaacs’  framework  (as  discussed 
above)  to  obtain  motion  strategies  for  a  network  of  agents  to  evade  the  jamming  attack.  In  this  work, 
however,  we  imagine  a  scenario  in  which  each  agent  has  prior  knowledge  about  the  underlying  value  function 
under  perfect  state  information.  Due  to  lack  of  information  about  all  the  agents  in  the  team,  each  agent 
is  forced  to  make  a  local  decision  based  only  on  the  information  about  its  neighbors.  We  develop  online
algorithms  under  two  different  decentralized  information  patterns  which  converge  for  each  player  to  local 
strategies  that  use  estimators  designed  based  on  state  equations  and  the  value  function.  A  further  work 
in  this  direction  is  [30],  where  we  address  the  problem  where  each  UAV  determines  its  control  strategy 
based  on  limited  information  available  from  its  neighbors  in  the  network  graph.  The  limitations  are  posed 
in  a  way  such  that  each  UAV  receives  the  state  information  about  other  UAVs  in  the  formation  that  are
at  most  n  hops  away  in  the  network  graph.  Under  this  information  structure,  we  study  the  performance 
of  the  entire  formation  when  each  UAV  runs  an  estimator  based  on  the  underlying  information  pattern 
in  order  to  compute  its  actions.  The  performance  measure  considered  in  this  game  is  the  maximum  time 
for  which  the  network  remains  connected  in  the  presence  of  an  aerial  adversary.  We  again  use  the  tools  and
conceptual  framework  of  differential  game  theory  to  obtain  the  saddle-point  strategies  of  the  underlying  P-E 
game  under  the  constrained  information  structure.  We  present  results  on  the  attainable  performance  when 
1  <  n  <  diam(G'),  where  diam(G')  represents  the  diameter  of  the  graph  of  the  underlying  communication 
network. 
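The n-hop information structure described above can be sketched with a breadth-first search. The adjacency-dictionary representation and the function names are assumptions for illustration, and the sketch assumes a connected graph; it only shows which agents' states a given UAV would see for a given n, and how n compares with the graph diameter.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def n_hop_neighbors(adj, source, n):
    """Agents whose state is available to `source` under the n-hop information pattern."""
    return {v for v, d in hop_distances(adj, source).items() if 0 < d <= n}


def diameter(adj):
    """Largest hop distance between any two nodes (graph assumed connected)."""
    return max(max(hop_distances(adj, s).values()) for s in adj)
```

On a four-agent line graph, for instance, an end agent sees one other agent for n = 1 and the whole formation once n reaches the diameter, 3.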

Two  other  recent  papers  on  jamming  games  between  two  teams  are  [28]  and  [29],  where  we  study  the 
problem  of  power  allocation  and  adaptive  modulation  in  teams  of  decision  makers.  We  restrict  the  study 
initially  to  the  case  of  two  teams  with  each  team  consisting  of  two  mobile  agents.  Agents  belonging  to  the 
same  team  communicate  over  wireless  ad  hoc  networks,  and  they  try  to  split  their  available  power  between 
the  tasks  of  communication  and  jamming  the  nodes  of  the  other  team.  The  agents  have  constraints  on  their 
total  energy  and  instantaneous  power  usage.  The  cost  function  adopted  is  the  difference  between  the  rates 
of  erroneously  transmitted  bits  of  each  team.  We  model  the  adaptive  modulation  problem  as  a  zero-sum 
matrix  game  which  in  turn  gives  rise  to  a  continuous  kernel  game  to  handle  power  control.  Based  on  the 
communications  model,  we  present  sufficient  conditions  on  the  physical  parameters  of  the  agents  for  the 
existence  of  a  pure  strategy  saddle-point  equilibrium  (PSSPE).  We  present  simulation  results  for  the  case 
when  the  agents  are  holonomic. 
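For a finite zero-sum matrix game, a pure-strategy saddle point can be checked directly from the cost matrix. The sketch below assumes the convention that the row player minimizes and the column player maximizes the entry; the function name and the example matrices are hypothetical and are not taken from [28] or [29].

```python
def pure_saddle_points(cost):
    """Return all (row, column, value) triples that are pure-strategy saddle points.

    Convention (assumed): the row player minimizes and the column player maximizes
    cost[i][j]. An entry is a saddle point when it is the largest in its row and
    the smallest in its column, so neither player gains by deviating alone.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    points = []
    for i in range(n_rows):
        for j in range(n_cols):
            row_max = max(cost[i])
            col_min = min(cost[r][j] for r in range(n_rows))
            if cost[i][j] == row_max and cost[i][j] == col_min:
                points.append((i, j, cost[i][j]))
    return points
```

The matrix [[3, 1], [4, 2]] has a saddle point at the first row and first column with value 3, whereas the matching-pennies matrix [[1, -1], [-1, 1]] has none, so its equilibrium is in mixed strategies only.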

In  a  recent  paper  [16],  we  have  considered  a  variation  of  the  problem  above  where  now  each  player  has 
an  omnidirectional  antenna  for  jamming  the  communication  between  the  members  of  the  other  team.  Again 
we  consider  the  case  of  two  teams  with  each  team  consisting  of  two  mobile  agents.  Agents  belonging  to  the 
same  team  communicate  over  wireless  ad  hoc  networks,  and  they  try  to  split  their  available  power  between 
the  tasks  of  communication  and  jamming  the  nodes  of  the  other  team.  The  agents  again  have  constraints  on 
their  total  energy  and  instantaneous  power  usage,  and  the  cost  function  adopted  is  the  difference  between  the 
rates  of  erroneously  transmitted  bits  of  each  team.  Formulating  the  problem  as  a  zero-sum  differential  game 
between  two  teams,  we  prove  the  existence  of  a  PSSPE  and  obtain  a  characterization  of  optimal  strategies. 
What  we  observe  is  a  switching  behavior  in  the  optimal  communication  strategy  within  a  team,  over  the 
time  horizon  of  the  entire  game. 

Our  final  work  on  this  general  topic  is  [15],  where  we  study  efficient  transmission  of  information  in  a 
wireless  medium  with  stationary  nodes  and  relays,  but  again  in  an  adversarial  environment.  We  study  the 
complex  decision  making  processes  between  such  a  network  of  wireless  users  that  perform  uplink  transmission 
via  relay  stations  and  an  active  malicious  node  that  is  able  to  act  as  an  eavesdropper  and  as  a  jammer.  We
formulate  a  noncooperative  game  in  which  the  wireless  users  and  the  malicious  node  are  the  players.  On 
the  one  hand,  the  users  seek  to  choose  the  relay  station  that  maximizes  their  utilities  which  reflect  their 
potential  mutual  interference  as  well  as  the  security  of  the  chosen  data  transmission  path.  On  the  other 
hand,  the  objective  of  the  malicious  node  is  to  choose  whether  to  eavesdrop,  jam,  or  use  a  combination 
of  both  strategies,  in  a  way  to  reduce  the  total  capacity  at  all  the  hops  of  the  network.  To  solve  the 
game,  we  introduce  a  fictitious  play-based  algorithm  using  which  the  users  and  the  malicious  node  reach  a 
mixed-strategy  Nash  equilibrium.  Simulation  results  show  that  the  proposed  approach  improves  the  average 
expected  utility  per  user  by  up  to  49.4neighbor  algorithm.  The  results  also  show  how  the  malicious  node 
can  strategically  decide  on  whether  to  jam  or  eavesdrop  depending  on  its  capabilities  and  objectives. 
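
As an illustration only, the following minimal Python sketch shows the basic fictitious play iteration on a toy two-player zero-sum matrix game. It is not the multi-user algorithm of [15]: the payoff matrix, the function name fictitious_play, and the interpretation of rows and columns are assumptions made for this example.

```python
def fictitious_play(A, rounds=5000):
    """Fictitious play in a two-player zero-sum matrix game.

    A[i][j] is the payoff to the row player (maximizer) when row i meets
    column j; the column player (minimizer) receives -A[i][j].
    Returns the empirical mixed strategies of both players.
    """
    m, n = len(A), len(A[0])
    row_counts = [0] * m
    col_counts = [0] * n
    # Arbitrary initial actions (an assumption of this sketch).
    row_counts[0] = 1
    col_counts[0] = 1
    for _ in range(rounds):
        # Payoff of each action against the opponent's empirical play so far.
        row_vals = [sum(A[i][j] * col_counts[j] for j in range(n)) for i in range(m)]
        col_vals = [sum(A[i][j] * row_counts[i] for i in range(m)) for j in range(n)]
        # Both players best-respond simultaneously to the observed frequencies.
        row_br = max(range(m), key=lambda i: row_vals[i])
        col_br = min(range(n), key=lambda j: col_vals[j])
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    total = rounds + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]


# Hypothetical 2x2 game: rows are two relay choices of a user, columns are
# the two attack modes {eavesdrop, jam} of the malicious node.
A = [[1.0, -1.0],
     [-1.0, 1.0]]
x, y = fictitious_play(A)
print(x, y)
```

In this toy game the only equilibrium is in mixed strategies, and the empirical frequencies approach it slowly, which is why the sketch runs several thousand rounds.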

2.3.  Quantization  and  transmission  over  noisy  channels 

In the publication [7], we have considered the problem of remotely controlling a continuous-time linear
time-invariant system driven by a Brownian motion process, when communication takes place over noisy
memoryless discrete- or continuous-alphabet channels. What makes this class of remote control problems
different from all the previously studied models is the presence of noise in both the forward channel (connecting
sensors to the controller) and the reverse channel (connecting the controller to the plant). For stability
of the closed-loop system, we look for the existence of an invariant distribution for the state, for which
we show that it is necessary that the entire control space and the state space be encoded, and that the
reverse  channel  be  at  least  as  reliable  as  the  forward  channel.  We  obtain  necessary  conditions  and  sufficient 
conditions  on  the  channels  and  the  controllers  for  stabilizability.  Using  properties  of  the  underlying  sampled 
Markov  chain,  we  show  that  under  variable-length  coding  and  some  realistic  channel  conditions,  stability  can 
be  achieved  over  discrete-alphabet  channels  even  if  the  entire  state  and  control  spaces  are  to  be  encoded  and 
the  number  of  bits  that  can  be  transmitted  per  unit  time  is  strictly  bounded.  For  control  over  continuous- 
alphabet  channels,  however,  a  variable  rate  scheme  is  not  necessary.  We  also  show  that  memoryless  policies 
are  rate-efficient  for  Gaussian  channels. 

Quantization  is  also  the  centerpiece  of  research  reported  in  [5],  [18],  [25]  and  [36],  which  study  its 
effect on the performance of L1 adaptive controllers. In [5] and [18], we address the problem of tracking for a
general class of uncertain nonlinear MIMO systems with input quantization, without requiring any matching
conditions. We consider the L1 adaptive controller and analyze its performance bounds in the presence of
input  quantization  of  two  types:  uniform  and  logarithmic.  In  both  cases  we  provide  the  transient  performance 
bounds,  which  are  decoupled  into  two  positive  terms.  One  of  these  terms  can  be  made  arbitrarily  small  by 
increasing  the  rate  of  adaptation,  while  the  other  term  can  be  made  small  by  increasing  the  quantization 
density.  The  performance  bounds  imply  that  with  sufficiently  dense  quantization  and  fast  adaptation,  the 
output  of  an  uncertain  MIMO  nonlinear  system  can  follow  the  desired  reference  input  sufficiently  closely. 
We note that with the L1 adaptive control architecture, fast adaptation does not lead to high-gain control and
retains a guaranteed time-delay margin, which is bounded away from zero.
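
To make the two quantizer types concrete, the following Python sketch gives one simple version of each. The function names, the level set u0 * rho**k, and the nearest-level rule on a log scale are assumptions of this sketch and are not taken from [5] or [18]; they only illustrate how the quantization error shrinks as the quantization becomes denser.

```python
import math


def uniform_quantize(u, step):
    """Round u to the nearest multiple of `step`; the error is at most step / 2."""
    return step * round(u / step)


def log_quantize(u, rho, u0=1.0):
    """Logarithmic quantizer with levels +/- u0 * rho**k (0 < rho < 1, k integer).

    The level nearest to |u| on a log scale is chosen, so the relative error
    |q - u| / |u| is at most rho**-0.5 - 1, which shrinks as rho approaches 1.
    """
    if u == 0.0:
        return 0.0
    k = round(math.log(abs(u) / u0) / math.log(rho))
    return math.copysign(u0 * rho ** k, u)


for u in (0.03, -0.7, 12.5):
    print(u, uniform_quantize(u, 0.1), log_quantize(u, 0.9))
```

The uniform quantizer has a bounded absolute error, whereas the logarithmic quantizer has a bounded relative error. Denser quantization corresponds to a smaller step in the first case and to rho closer to 1 in the second.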

An earlier work [36] looked at the case of uncertain linear systems, again with input quantization. We
show that the performance bounds of the L1 adaptive controller (in the presence of input quantization) have
an additional term, dependent upon the quality of quantization. The signals of the closed-loop L1 adaptive
systems  can  be  rendered  arbitrarily  close  to  the  corresponding  signals  of  a  bounded  reference  system  by 
increasing  the  adaptation  rate  and  improving  the  quantizer. 

2.4.  Inefficiency  of  Nash  equilibrium,  and  different  notions  of  price 

It  is  well  known  that  the  non-cooperative  Nash  equilibrium  in  nonzero-sum  games  is  generally  inefficient, 
which  means  that  it  would  be  possible  for  all  players  to  do  better  in  terms  of  attaining  higher  utilities  or  lower 
costs  (than  they  would  attain  under  Nash  equilibria,  even  if  the  equilibrium  is  unique)  through  cooperation. 
This  is  true  for  static  deterministic  games,  and  naturally  also  for  stochastic  games  as  well  as  dynamic  and 
differential games. In these latter classes of games, one could bring up additional issues with regard to
Nash  equilibria  beyond  efficiency  or  lack  thereof,  such  as  whether  an  increase  in  information  to  one  player 
(or  all  or  a  subset  of  the  players)  would  be  advantageous  to  that  player  (or  groups  of  players),  in  terms 
of  attaining  higher  utilities  or  lower  costs,  or  whether  acquiring  more  information  would  be  undesirable  for 
a  player.  In  the  special  class  of  games  where  all  players  have  the  same  utility  function  or  cost  function 
(that  is,  team  problems)  and  what  is  sought  is  the  global  maximum  or  global  minimum  of  these  functions, 
the  answer  to  such  a  query  is  clean,  which  is  that  additional  information  (defined  as  expansion  of  sigma 
fields)  can  never  hurt.  The  same  is  true  for  the  special  class  of  zero-sum  games.  In  stochastic  games,  or 
dynamic  and  differential  games  which  are  not  team  problems  or  zero-sum  games,  however,  the  answer  is  not 
that  clean,  and  one  could  encounter  quite  surprising  and  at  the  outset  counter-intuitive  results.  Perhaps 
the  first  demonstration  of  this  was  reported  by  the  PI  some  40  years  ago,  when  he  studied  two  classes 
of  two-player  stochastic  static  games,  one  a  linear-quadratic-Gaussian  (LQG)  model  and  the  other  one  a 
stochastic  Cournot  duopoly  model,  both  of  which  admit  unique  Nash  equilibria.  It  was  shown  that  for  the 
LQG  model  better  information  (on  some  stochastic  variables)  for  only  one  player  leads  to  lower  average  Nash 
equilibrium  costs  for  both  players,  but  in  the  duopoly  model  only  the  player  whose  information  is  improved 
benefits while the other one is hurt (in the sense that his average Nash equilibrium cost increases). Another
way  of  comparison  would  be  in  terms  of  the  relative  values  of  the  average  Nash  equilibrium  costs  attained 
by  the  players,  when  one  player  has  informational  advantage  over  the  other.  It  was  again  shown  by  the  PI 
that,  in  an  otherwise  completely  symmetric  game,  the  player  who  has  better  information  attains  higher  cost 
than  the  other  player  in  the  LQG  model  (the  counter-intuitive  result),  whereas  he  attains  lower  cost  in  the 
duopoly model (the intuitive result). Several manifestations of these conclusions can be seen also in dynamic
and differential games; for example, the time-consistent open-loop Nash equilibrium is not necessarily inferior to
the  strongly  time-consistent  closed-loop  feedback  Nash  equilibrium. 

Now coming back to the inefficiency of Nash equilibrium in a fixed nonzero-sum game, one question of
interest is the extent of this inefficiency, that is, how far a Nash equilibrium is from the
socially optimal solution, which is obtained as the maximum of the sum of the utilities of the players, or
some convex combination of the utilities (or minimum in the case of cost functions). The notion of the price
of anarchy (PoA) was introduced earlier as a quantification of this offset, as a utility ratio between the worst
possible  Nash  solution  (among  multiple  Nash  equilibria)  and  the  social  optimum.  In  a  way,  this  index  serves 
to  quantify  the  loss  of  efficiency  due  to  competition.  It  has  been  shown  that  in  routing  games  and  resource 
allocation  games,  PoA  is  bounded  by  a  constant,  allowing  agents  to  achieve  some  level  of  efficiency  despite 
being  suboptimal. 
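
The definition above can be illustrated on a small finite game. The Python sketch below enumerates pure-strategy Nash equilibria of a two-player game and computes the PoA as the ratio of the worst equilibrium welfare to the optimal welfare, with welfare taken as the sum of utilities. The utility tables and function names are hypothetical, and the sketch assumes positive utilities and the existence of at least one pure equilibrium.

```python
from itertools import product


def pure_nash_equilibria(U1, U2):
    """Return all pure-strategy Nash equilibria (i, j) of a two-player game.

    U1[i][j] and U2[i][j] are the utilities of players 1 and 2 when
    player 1 plays row i and player 2 plays column j.
    """
    m, n = len(U1), len(U1[0])
    equilibria = []
    for i, j in product(range(m), range(n)):
        row_is_best = all(U1[i][j] >= U1[k][j] for k in range(m))
        col_is_best = all(U2[i][j] >= U2[i][l] for l in range(n))
        if row_is_best and col_is_best:
            equilibria.append((i, j))
    return equilibria


def price_of_anarchy(U1, U2):
    """Worst equilibrium welfare divided by the socially optimal welfare."""
    m, n = len(U1), len(U1[0])

    def welfare(i, j):
        return U1[i][j] + U2[i][j]

    optimum = max(welfare(i, j) for i, j in product(range(m), range(n)))
    worst = min(welfare(i, j) for i, j in pure_nash_equilibria(U1, U2))
    return worst / optimum


# Hypothetical prisoner's-dilemma-type utilities (action 0: cooperate, 1: defect).
U1 = [[3, 0], [5, 1]]
U2 = [[3, 5], [0, 1]]
print(pure_nash_equilibria(U1, U2), price_of_anarchy(U1, U2))
```

In this example mutual defection is the only equilibrium, with welfare 2 against an optimal welfare of 6, so the ratio is 1/3.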

The idea of quantifying the gap between social optimality and game equilibrium solutions sparked much
follow-up work in the same vein. The price of simplicity has been introduced for a pricing game in
communication networks as the ratio between the revenue collected from a flat pricing rule and the maximum
possible revenue. Further, the price of uncertainty has been introduced to measure the payoff of an
expert user of a security game under complete information relative to that under incomplete information. In
another earlier work, the price of leadership has been proposed as a measure of comparison of utilities in a power
control game between Nash equilibria and Stackelberg solutions. In all of these works, primarily communication
networks have been used as a backdrop application domain, be it routing, resource allocation, power
control, or security. Game-theoretical methods along with Nash equilibrium have found many applications
in  communication  networks. 

In  our  recent  work  [8,  38],  we  introduce  and  discuss  several  indices  which  quantify  variations  or  offsets  in 
the  payoff  values  or  costs  attained  under  Nash  equilibria  in  the  context  of  differential  games  (DGs).  We  first 
extend  the  notion  of  PoA  to  DGs,  which  heretofore  has  been  primarily  limited  to  static  continuous  kernel 
games.  We  provide  a  characterization  of  PoA  for  a  class  of  scalar  linear-quadratic  (LQ)  DGs,  and  quantify 
the  efficiency  loss  in  the  long  run  when  the  players  behave  non-cooperatively  under  the  Nash  equilibrium 
concept. We consider both open-loop (OL) and closed-loop (CL) information structures (ISs). We show that
for  the  class  of  scalar  LQ  DGs  with  CL  IS  using  the  strongly  time-consistent  CL  feedback  Nash  equilibrium, 
the  PoA  has  some  appealing  computable  upper  bounds,  which  can  further  be  approximated  when  the  number 
of  players  is  sufficiently  large  (that  is,  the  large  population  regime),  whereas,  under  the  OL  IS,  it  is  possible 
to  obtain  an  expression  for  the  PoA  in  closed  form. 

As  mentioned,  going  from  static  to  dynamic  (differential)  games  brings  in  the  possibility  of  various  ISs 
which  add  richness  to  the  (Nash  equilibrium)  solution  of  a  game.  Different  ISs  (generally)  yield  different 
equilibrium  solutions,  and  hence  as  already  pointed  out,  IS  is  a  crucial  factor  in  the  investigation  of  PoA  in 
DGs. Motivated by this, we introduce another index, the price of information (PoI), which is a result of the
comparison of the equilibrium utilities or costs under different ISs. For the class of scalar LQ DGs above,
we have shown that the PoI between the feedback and open-loop ISs is bounded from below by √2/2 and
from above by √2, again in the large population regime. Finally, motivated by some recent results on the
level  of  cooperation  between  players  in  a  routing  game,  captured  by  the  degree  of  willingness  of  a  player  to 
place  partial  weight  on  other  players’  utilities  in  his  utility  function,  we  introduce  the  price  of  cooperation 
(PoC)  as  a  measure  of  benefit  or  loss  to  a  player  on  his  base  Nash  equilibrium  payoff  due  to  cooperation. 

Another  set  of  new  results  on  nonzero-sum  games,  but  with  decision  hierarchy,  has  been  reported  in 
papers [10, 41], which introduce the notion of mixed leadership in nonzero-sum differential games, where
there  is  no  fixed  hierarchy  in  decision  making  with  respect  to  the  players.  Whether  a  particular  player 
is  leader  or  follower  depends  on  the  instrument  variable  s/he  is  controlling,  and  it  is  possible  for  a  player 
to  be  both  leader  and  follower,  depending  on  the  control  variable.  We  have  studied  two-player  open-loop 
differential  games  in  this  framework,  and  obtained  a  complete  set  of  equations  (differential  and  algebraic) 
which  yield  the  controls  in  the  mixed-leadership  Stackelberg  solution.  The  underlying  differential  equations 
are  coupled  and  have  mixed  boundary  conditions.  Our  work  also  discusses  the  special  case  of  linear-quadratic 
differential  games,  in  which  case  solutions  to  the  coupled  differential  equations  can  be  expressed  in  terms  of 
solutions  to  coupled  Riccati  differential  equations  which  are  independent  of  the  state  trajectory. 


Finally, in the recent paper [2], we formulate an evolutionary multiple access control game with continuous
variable actions and coupled constraints. We characterize Nash equilibria of the game and show that
the  pure  equilibria  are  efficient  (Pareto  optimal)  and  also  resilient  to  deviations  by  coalitions  of  any  size,  i.e., 
they  are  strong  equilibria.  We  use  the  concepts  of  price  of  anarchy  and  strong  price  of  anarchy  to  study  the 
collective  performance  of  the  players  in  the  game.  We  also  address  the  question  of  how  to  select  one  specific 
equilibrium  solution  using  the  concepts  of  normalized  equilibrium  and  evolutionarily  stable  strategies.  We 
examine  the  long-run  behavior  of  these  strategies  under  several  classes  of  evolutionary  game  dynamics,  such 
as Brown-von Neumann-Nash dynamics, Smith dynamics, and replicator dynamics. In addition, we examine
correlated  equilibrium  for  the  single-receiver  model.  Correlated  strategies  are  based  on  signaling  structures 
before  making  decisions  on  rates.  We  then  focus  on  evolutionary  games  for  hybrid  additive  white  Gaussian 
noise  multiple  access  channel  with  multiple  users  and  multiple  receivers,  where  each  user  chooses  a  rate  and 
splits  it  over  the  receivers.  Users  have  coupled  constraints  determined  by  the  capacity  regions.  Building 
upon  the  static  game  formulation  and  results,  we  formulate  a  system  of  hybrid  evolutionary  game  dynamics 
using  G-function  dynamics  and  Smith  dynamics  on  rate  control  and  channel  selection,  respectively.  We  show 
that  the  evolving  game  has  an  equilibrium  and  illustrate  these  dynamics  with  numerical  examples. 
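
Of the evolutionary dynamics named above, the replicator dynamics are the simplest to write down. The following Python sketch integrates them with Euler steps for a toy channel-selection population game. The payoff model (channel quality minus load), the parameter values, and all function names are assumptions of this sketch and do not reproduce the rate-control model of [2].

```python
def replicator_step(x, payoff, dt):
    """One Euler step of the replicator dynamics x_i' = x_i * (f_i(x) - f_bar(x))."""
    f = payoff(x)
    f_bar = sum(xi * fi for xi, fi in zip(x, f))  # population-average payoff
    return [xi + dt * xi * (fi - f_bar) for xi, fi in zip(x, f)]


def channel_payoff(x, quality=(1.0, 0.6)):
    """Hypothetical payoff: channel quality minus the population share using it."""
    return [q - xi for q, xi in zip(quality, x)]


def run_replicator(x0, steps=5000, dt=0.01):
    """Integrate the replicator dynamics from the initial shares x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, channel_payoff, dt)
    return x


shares = run_replicator([0.5, 0.5])
print(shares)
```

At the rest point both channels yield the same payoff. With the assumed qualities this requires shares of 0.7 and 0.3, since 1.0 - 0.7 = 0.6 - 0.3.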

2.5.  Learning  and  iterative  computation  under  minimal  exchange  of  information 

An  important  issue  in  multi-agent  systems  is  the  convergence  of  iterative  schemes  adopted  by  agents 
(players)  based  on  various  behavioral  patterns  and  using  minimal  information  on  the  actions  of  others,  to 
an equilibrium which may be computable offline should all information be available and centralized (which
it  is  not).  In  a  series  of  papers  [3,  4,  20,  26,  33,  45,  39],  we  have  addressed  this  problem  within  the  context 
of  static  Nash  games.  We  have  introduced  a  non-model  based  approach  for  locally  stable  convergence  to 
Nash  equilibria  in  static,  noncooperative  games  with  a  finite  number  of  (say,  N)  players.  In  classical  game 
theory  algorithms,  each  player  employs  the  knowledge  of  the  functional  form  of  his  payoff  and  the  knowledge 
of  the  other  players’  actions,  whereas  in  our  approach  the  players  need  to  measure  only  their  own  payoff 
values (and hence online information on other players’ actions is provided only to the extent they affect
an  individual  player’s  payoff).  The  response  strategies  of  the  players  in  our  work,  and  our  analysis,  are 
based  on  the  extremum  seeking  approach,  which  has  previously  been  developed  for  standard  optimization 
problems  and  employs  sinusoidal  perturbations  to  estimate  the  gradient.  We  first  consider  static  games  with 
quadratic  payoff  functions  before  generalizing  our  results  to  games  with  non-quadratic  payoff  functions  that 
are  the  output  of  a  dynamic  system.  Specifically,  we  consider  general  nonlinear  differential  equations  with 
N  inputs  and  N  outputs,  where  in  the  steady  state,  the  output  signals  represent  the  payoff  functions  of  a 
noncooperative  game  played  by  the  steady-state  values  of  the  input  signals.  We  employ  local  averaging  theory 
and  obtain  local  convergence  results  both  for  quadratic  payoffs,  where  the  actual  convergence  is  semi-global, 
and for non-quadratic payoffs, where the potential existence of multiple Nash equilibria precludes semi-global
convergence. Our convergence conditions coincide with conditions that arise in model-based Nash
equilibrium  seeking.  However,  in  our  framework  the  user  is  not  meant  to  check  these  conditions  because  the 
payoff  functions  are  presumed  to  be  unknown.  For  non-quadratic  payoffs,  convergence  to  a  Nash  equilibrium 
is  not  perfect,  but  is  biased  in  proportion  to  the  perturbation  amplitudes  and  the  third  derivatives  of  the 
payoff  functions.  We  quantify  the  size  of  these  residual  biases  and  confirm  their  existence  numerically  in 
an  example  noncooperative  game.  In  this  example,  we  present  the  first  application  of  extremum  seeking 
with projection to ensure that the players’ actions remain in a given closed and bounded action set. We also
consider extensions of these results to countably and uncountably infinite numbers of players.
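
The mechanism described above can be sketched for a toy two-player game with quadratic payoffs. In the Python sketch below, each player adds a sinusoidal probe to its action, measures only its own payoff value, and integrates the product of the probe and the measured payoff. The payoff functions, gains, frequencies, and the Euler discretization are assumptions chosen for this illustration, not the setup analyzed in the cited papers.

```python
import math


def payoffs(u1, u2):
    """Quadratic payoffs of a hypothetical two-player game.

    Best responses are u1 = 0.5*u2 + 1 and u2 = 0.5*u1 + 1, so the unique
    Nash equilibrium is (2, 2). The players never use these formulas;
    they only observe the returned numbers.
    """
    return -(u1 - 0.5 * u2 - 1.0) ** 2, -(u2 - 0.5 * u1 - 1.0) ** 2


def nash_seeking(T=100.0, dt=0.001, a=0.1, k=20.0, omegas=(30.0, 40.0)):
    """Extremum-seeking updates d(u_hat_i)/dt = k * a*sin(w_i*t) * J_i (Euler steps)."""
    u_hat = [0.0, 0.0]
    for step in range(int(T / dt)):
        t = step * dt
        # Each player perturbs its action with its own probing frequency.
        probes = [a * math.sin(w * t) for w in omegas]
        J = payoffs(u_hat[0] + probes[0], u_hat[1] + probes[1])
        for i in range(2):
            # Correlating the probe with the measured payoff estimates the gradient.
            u_hat[i] += dt * k * probes[i] * J[i]
    return u_hat


u_hat = nash_seeking()
print(u_hat)
```

On average, each update follows (k * a**2 / 2) times the partial derivative of the player's own payoff, which is why distinct probing frequencies are used: cross terms between the players' probes then average out.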

Another  line  of  research  on  iterative  schemes  for  games  and  a  new  set  of  algorithms  have  been  introduced 
in  [34],  where  the  framework  is  that  of  a  class  of  two-person  zero-sum  stochastic  games  with  an  arbitrary 
number  of  states  and  a  finite  number  of  actions  for  each  player,  with  possibly  probabilistic  (mixed)  strategies. 
When each player has complete knowledge of its payoff function and has access to the past actions of
the  others,  then  there  is  an  arsenal  of  tools  such  as  fictitious  play  algorithms,  best  response  dynamics,  and 
gradient-based  algorithms,  that  can  be  used  to  arrive  at  the  equilibrium  of  the  game.  However,  it  is  well 
known  that  these  algorithms  may  fail  to  converge  even  under  the  perfect  observation  of  actions  and  payoffs. 
A new learning challenge hence arises when a player does not know its own payoff function and/or has no
information about the past actions of the other players (as also discussed above). In this case, the player
needs  to  interact  with  the  environment  to  find  out  its  expected  payoff  and  its  optimal  strategy.  In  practical 
applications,  we  are  often  in  search  of  distributed  learning  algorithms  that  require  a  minimal  amount  of 
information  and  a  minimal  amount  of  resources.  It  is  then  natural  to  ask  whether  there  exists  a  learning 
scheme  that  demands  less  information  and  less  memory  within  a  dynamically  evolving  environment,  and  leads 
to  an  efficient,  stable  and  fair  outcome.  We  address  this  challenge  by  proposing  a  class  of  heterogeneous 
learning  algorithms  in  a  scenario  where  the  players  do  not  know  their  own  payoff  functions.  At  each  time 
t,  each  player  chooses  an  action  and  receives  a  numerical  value  for  its  payoff  or  perceived  payoff  as  an 
outcome  of  the  instantaneous  game.  In  contrast  to  fictitious  play  and  best  response  dynamics  which  require 
the  knowledge  of  the  history  of  actions  played  by  the  other  players,  our  learning  algorithm  relaxes  this 
assumption.  Indeed,  it  is  often  implausible  and  impractical  in  applications  to  assume  the  capability  of 
observations  of  the  actions  of  the  other  players.  Furthermore,  we  assume  that  the  state  space  of  the  game 
and  its  transition  law  between  the  states  are  unknown  to  the  players.  In  addition,  the  players  also  do  not 
have  the  knowledge  of  the  action  spaces  of  the  others.  The  question  we  have  addressed  is  how  much  the 
players  can  expect  to  learn  under  such  circumstances.  We  have  introduced  in  [34]  different  coupled  (or 
combined)  and  fully  distributed  learning  schemes  that  enable  learning  optimal  strategies  and  concurrently 
estimating the optimal payoffs. In contrast to standard reinforcement learning algorithms, which reinforce
either strategies or payoffs alone for equilibrium learning, the algorithm that couples
payoff-reinforcement learning with strategy-reinforcement learning enables immediate prediction
and updates the strategies using estimates refreshed by recent experience. Our learning algorithms
also  offer  the  degrees  of  freedom  to  model  different  levels  of  rationality  and  learning  rates  of  the  players. 
The  ordinary  differential  equations  (ODEs)  associated  with  the  stochastic  learning  algorithms  differ  from 
the  standard  replicator  dynamics,  best  response  dynamics  and  fictitious  play  dynamics.  We  also  establish 
particular  connections  to  logit  dynamics  and  imitative  logit  dynamics.  Using  stochastic  approximation 
techniques,  and  under  suitable  assumptions  on  the  learning  rates,  we  show  the  convergence  of  different 
learning  algorithms  to  a  new  class  of  game  dynamics  and  establish  their  asymptotic  properties  within  a  class 
of  zero-sum  stochastic  games. 
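
A much simplified version of such a coupled scheme can be sketched as follows. The Python sketch treats a single-state zero-sum matrix game (rather than a stochastic game with unknown state transitions), and it uses a Boltzmann-Gibbs (logit) response to the payoff estimates as the strategy update. The step sizes, the temperature, and all function names are assumptions of this sketch, not the algorithms of [34].

```python
import math
import random


def softmax(values, temperature):
    """Boltzmann-Gibbs (logit) distribution over actions."""
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def sample(strategy, rng):
    """Draw an action index from a mixed strategy."""
    r, acc = rng.random(), 0.0
    for action, prob in enumerate(strategy):
        acc += prob
        if r < acc:
            return action
    return len(strategy) - 1


def coupled_learning(A, rounds=20000, lam=0.01, mu=0.05, temperature=0.1, seed=0):
    """Joint strategy and payoff learning from own realized payoffs only."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x, y = [1.0 / m] * m, [1.0 / n] * n   # mixed strategies
    rx, ry = [0.0] * m, [0.0] * n         # per-action payoff estimates
    for _ in range(rounds):
        i, j = sample(x, rng), sample(y, rng)
        # Each player observes only its own numerical payoff.
        payoff_row, payoff_col = A[i][j], -A[i][j]
        # Payoff reinforcement: refresh the estimate of the action just played.
        rx[i] += mu * (payoff_row - rx[i])
        ry[j] += mu * (payoff_col - ry[j])
        # Strategy reinforcement: move toward the logit response to the estimates.
        bx, by = softmax(rx, temperature), softmax(ry, temperature)
        x = [(1 - lam) * xi + lam * bi for xi, bi in zip(x, bx)]
        y = [(1 - lam) * yi + lam * bi for yi, bi in zip(y, by)]
    return x, y, rx, ry


A = [[1.0, -1.0], [-1.0, 1.0]]
x, y, rx, ry = coupled_learning(A)
print(x, y)
```

Neither player uses the opponent's actions or the payoff matrix in its updates. Using different values of lam, mu, or temperature for each player would be one way to model heterogeneous learning rates and levels of rationality.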

2.6.  Large  population  games,  and  equilibrium  analysis 

An  important  research  direction  in  multi-agent  systems  is  the  analysis  of  collective  behavior  that  arises 
when  the  population  is  large.  In  the  terminology  of  game  theory  this  entails  the  analysis  of  different  equilibria 
under  different  ISs  when  the  number  of  players  is  arbitrarily  large.  We  have  conducted  such  a  study  in 
[21,  31],  within  the  context  of  risk-sensitive  stochastic  differential  games.  Risk-sensitivity  is  captured  by 
exponentiating  the  integral  cost  (over  the  duration  of  the  game)  of  a  player,  before  taking  the  expectation, 
and  this  brings  in  additional  robustness  to  the  resulting  equilibrium  strategies  (or  optimal  control  laws, 
in the case of risk-sensitive stochastic control), namely robustness to unmodeled inputs to the system by, say, an
adversary. We first introduce in [21] a mean-field stochastic differential game model where the players are
coupled  not  only  via  their  risk-sensitive  cost  functionals  but  also  via  their  states.  The  main  coupling  term  is 
the  mean-field  process,  also  called  occupancy  process  or  population  profile  process.  Then,  using  a  particular 
structure  of  state  dynamics,  we  derive  the  mean-field  limit  of  the  individual  state  dynamics,  leading  to  a 
nonlinear controlled macroscopic McKean-Vlasov equation. Combining this with the convergence of
the risk-sensitive cost functional, we provide the mean-field optimality principle, and obtain compatibility
with the density distribution using the Fokker-Planck-Kolmogorov forward equation. The mean-field value
of the exponentiated cost functional coincides with the value function of a Hamilton-Jacobi-Bellman (HJB)
equation with an additional quadratic term. We provide an explicit solution of the mean-field best response
when the instantaneous cost functions are log-quadratic and the state dynamics are affine in the control.
We formulate an equivalent mean-field risk-neutral problem and characterize the corresponding mean-field
equilibria in terms of backward-forward macroscopic McKean-Vlasov equations, Fokker-Planck-Kolmogorov
equations  and  HJB  equations. 
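
The effect of exponentiating the cost before taking the expectation can be seen in a small Monte Carlo experiment. The Python sketch below estimates (1/theta) * log E[exp(theta * C)] from samples of a scalar cost C. The 1/theta-log normalization is a common convention assumed here, and the Gaussian cost samples and function name are hypothetical.

```python
import math
import random


def risk_sensitive_cost(samples, theta):
    """Estimate (1/theta) * log E[exp(theta * C)] from cost samples, theta > 0."""
    # Log-sum-exp shift for numerical stability.
    top = max(theta * c for c in samples)
    mean_exp = sum(math.exp(theta * c - top) for c in samples) / len(samples)
    return (top + math.log(mean_exp)) / theta


rng = random.Random(0)
costs = [rng.gauss(1.0, 0.5) for _ in range(100000)]  # hypothetical cost samples
risk_neutral = sum(costs) / len(costs)
risk_averse = risk_sensitive_cost(costs, theta=1.0)
print(risk_neutral, risk_averse)
```

For a Gaussian cost with mean m and variance s**2, the exact value is m + theta * s**2 / 2, so the risk-sensitive criterion penalizes variability in addition to the mean. By Jensen's inequality it is never below the risk-neutral expected cost.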

In [27], we have looked at the consensus problem within a differential game-theoretic mean-field framework.
In networked systems, agents typically seek to achieve a task with some knowledge of their neighbors
or immediate friends. Consensus is one of the fundamental and pivotal problems in decision making involving
a large number of distributed agents reaching agreement in their opinions, resources, security, and the like.
In [27], we work in the framework of linear-quadratic nonzero-sum differential games defined on an infinite
horizon  with  discounted  cost,  which  we  study  under  different  information  structures  and  with  a  view  to 
consensus.  We  characterize  the  open-loop  (OL)  and  strongly  time-consistent  closed-loop  (STC  CL)  Nash 
equilibrium  (NE)  strategies  for  finite  population  and  large  population  regimes.  For  the  finite  population 
game,  the  STC  CL  NE  strategy  of  each  agent  is  affine  in  the  states  of  its  neighbors  and  consensus  is  achieved 
depending  on  the  initial  states  of  the  agents.  For  a  large  homogeneous  population,  the  STC  CL  NE  requires 
the  solution  of  a  nonlinear  PDE  that  describes  the  state  evolution  of  the  population,  which  is  coupled  with 
a  set  of  coupled  algebraic  Riccati  equations.  We  also  study  the  relationship  between  OL  and  STC  CL  as  the 
population  or  the  neighborhoods  grow. 

2.7.  Goal-oriented  coalition  formation  in  networks  of  agents 

In  one  piece  of  work  [6,  45]  we  have  looked  at  formation  of  coalitions  among  multiple  agents  (players) 
in  a  distributed  network,  with  the  process  driven  by  goals  of  individual  agents.  The  specific  model  (which 
has applications involving UAVs) is as follows: A number of agents are required to collect data from several
arbitrarily  located  tasks.  In  a  wireless  network  framework,  each  task  represents  a  queue  of  packets  that 
require  collection  and  subsequent  wireless  transmission  by  the  agents  to  a  central  receiver.  The  problem 
is  modeled  as  a  hedonic  coalition  formation  game  between  the  agents  and  the  tasks  that  interact  in  order 
to  form  disjoint  coalitions.  Each  formed  coalition  is  modeled  as  a  polling  system  consisting  of  a  number  of 
agents,  designated  as  collectors,  which  move  between  the  different  tasks  present  in  the  coalition,  collect  and 
transmit  the  packets.  Within  each  coalition,  some  agents  might  also  take  the  role  of  a  relay  for  improving 
the  packet  success  rate  of  the  transmission.  The  hedonic  coalition  formation  algorithm  developed  allows 
the  tasks  and  the  agents  to  take  distributed  decisions  to  join  or  leave  a  coalition,  based  on  the  achieved 
benefit  in  terms  of  effective  throughput,  and  the  cost  in  terms  of  polling  system  delay.  As  a  result  of  these 
decisions,  the  agents  and  tasks  structure  themselves  into  independent  disjoint  coalitions  which  constitute 
a  Nash-stable  network  partition.  Moreover,  the  proposed  coalition  formation  algorithm  allows  the  agents 
and  tasks  to  adapt  the  topology  to  environmental  changes  such  as  the  arrival  of  new  tasks,  the  removal  of 
existing  tasks,  or  the  mobility  of  the  tasks.  Simulation  results  show  how  the  proposed  algorithm  allows  the 
agents  and  tasks  to  self-organize  into  independent  coalitions,  while  improving  the  performance,  in  terms  of 
average player (agent or task) payoff, by at least 30.26% (for a network of 5 agents with up to 25 tasks)
relative  to  a  scheme  that  allocates  nearby  tasks  equally  among  agents. 
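
The join-or-leave logic behind a Nash-stable partition can be illustrated with a toy hedonic game. In the Python sketch below, each player moves to the coalition (possibly an empty one) that strictly improves its own utility, until nobody wants to move. The symmetric, additively separable utilities are an assumption made so that these switches are guaranteed to terminate; they stand in for the throughput and delay terms of the actual model in [6], and all names are hypothetical.

```python
def utility(i, members, v):
    """Additively separable utility of player i in a coalition with `members`."""
    return sum(v[i][j] for j in members if j != i)


def nash_stable_partition(v, max_passes=100):
    """Let players switch coalitions until no unilateral move is strictly better.

    v[i][j] is the (symmetric) value player i assigns to sharing a coalition
    with player j. Returns a list giving each player's coalition label.
    """
    n = len(v)
    label = list(range(n))  # start with every player alone
    for _ in range(max_passes):
        moved = False
        for i in range(n):
            def value(c):
                # Utility of player i if it belonged to coalition c.
                return utility(i, [j for j in range(n) if label[j] == c], v)
            best = max(range(n), key=value)
            if value(best) > value(label[i]):
                label[i] = best
                moved = True
        if not moved:
            break  # no player wants to deviate: the partition is Nash-stable
    return label


# Hypothetical values: players 0 and 1 benefit from each other, player 2 does not fit.
v = [[0, 2, -1],
     [2, 0, -1],
     [-1, -1, 0]]
label = nash_stable_partition(v)
print(label)
```

Because there are as many labels as players, a player can always choose to be alone, which corresponds to leaving a coalition without joining another.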


3.  Recap 

This  report  has  discussed  a  number  of  topics  and  issues  that  are  of  paramount  importance  to  networked 
systems,  and  has  described  the  broad  field  of  study  we  have  undertaken  in  our  research  and  the  results  we 
have  obtained  during  the  close  to  3-year  course  of  this  AFOSR  Grant.  Our  research  has  led  to  a  deeper 
understanding  of  the  various  trade-offs  that  exist  in  design  and  decision  making  in  networked  systems,  which 
involve incompleteness of information, decentralization, communication constraints, resource allocation,
distributed sensing and control, and coordination and consensus formation. During the course of this research
we have also started working on a book [46] on stochastic networked control systems, which is now close to
completion and will contain an in-depth analysis of some of the issues discussed above, particularly those
that  involve  quantization,  uncertainty,  performance,  stabilization,  decentralization,  and  learning. 

4.  Related  Plenary  Talks  and  Special  Colloquia  given  by  the  PI 

Below is a list of plenary talks and special colloquia given by Tamer Başar during the 3-year course of
the  Grant: 

• Chinese Automation Congress (CAC 2011), Beijing, China, November 27, 2011 (Plenary) (Title: Multi-Agent
Networked Systems: Efficiency Through Coordination and Control)

• Chinese Control Conference (CCC 2011), Yantai, Shandong Province, China, July 22-24, 2011 (Plenary)
(Title: Sensing, Coordination and Control in Adversarial Environments with Limited Actions)

• 5th International ICST Conference on Performance Evaluation Methodologies and Tools (ValueTools
2011), ENS, Cachan, France, May 17-19, 2011 (Plenary) (Title: Dynamic Teams and Games with
Non-standard  Information) 

• Turkish National Committee on Automatic Control, National Symp. on Automatic Control (TOK’10),
Gebze  Institute  of  Technology,  Gebze,  Turkey,  September  21-23,  2010  (Plenary)  (Title:  Variations  on 
the  LQG  Paradigm  and  the  Emerging  Subtleties) 

• Amorph International Workshop - Amorphous Computing & Complex Biological Networks, The Halifax,
Endcliffe  Village,  Sheffield,  UK,  August  17-20,  2010  (Plenary)  (Title:  Games,  Networks,  and  Distributed 
Computation) 

•  6th  Spain,  Italy  and  Netherlands  Meeting  on  Game  Theory  (SING  6),  Palermo,  Italy,  July  7-9,  2010 
(Plenary)  (Title:  Dynamic  Teams,  Games,  and  Non-Classical  Information) 

•  10th  International  Conference  on  Automation  Technology  (Automation  2009),  Tainan,  Taiwan,  June 
27-28,  2009  (Plenary)  (Title:  Networked  Sensing  and  Control  with  Limits  on  Transmission) 

•  2009  (21st)  Chinese  Control  and  Decision  Conference  (CCDC  2009),  Guilin  (Guangxi),  China,  June 
17-19,  2009  (Plenary)  (Title:  Games,  Decisions,  Control  and  Communications:  Common  Threads  and 
Coping with Non-Neutrality)

•  2009  (5th)  Northeast  Control  Workshop  (NECW  2009),  Pittsburgh,  Pennsylvania,  April  24-26,  2009 
(Plenary)  (Title:  Non-Neutral  Decision  Making  in  Control  and  Dynamic  Games) 

• Workshop on “Control Systems Security: Challenges and Directions,” CDC/ECC 2011, Orlando, FL,
December 11, 2011 (Title: Game Theoretic Approaches to Security)

• Workshop on “Game Theory for Finance, Social and Biological Sciences (GAM),” the University of
Warwick, Coventry, England, April 14-17, 2010 (Title: Non-Neutral Decision Making in Stochastic
Teams and Games)

•  ISR  Distinguished  Lecture,  University  of  Maryland,  College  Park,  Maryland,  October  11,  2010  (Title: 
Sensing,  Control,  and  Decision  Making  with  Limited  Actions) 

5.  Publications  and  Conference  Presentations  supported  by  the  Grant 

[1] S. Bhattacharya and T. Başar, “Differential game-theoretic approach to a spatial jamming problem,”
Advances in Dynamic Game Theory and Applications, Annals of Dynamic Games, vol. 11, Birkhäuser,
2012 (to appear).

[2] Q. Zhu, H. Tembine, and T. Başar, “Evolutionary games for multiple access control,” Advances in
Dynamic Game Theory and Applications, Annals of Dynamic Games, vol. 11, Birkhäuser, 2012 (to
appear).

[3] P. Frihauf, M. Krstic, and T. Başar, “Nash equilibrium seeking for dynamic systems with non-quadratic
payoffs,” Advances in Dynamic Game Theory and Applications, Annals of Dynamic Games, vol. 11,
Birkhäuser, 2012 (to appear).

[4] P. Frihauf, M. Krstic, and T. Başar, “Nash equilibrium seeking in noncooperative games,” IEEE Trans
Automatic Control, 57(5):1192-1207, May 2012.

[5] H. Sun, N. Hovakimyan, and T. Başar, “L1 adaptive controller for uncertain nonlinear multi-input
multi-output systems with input quantization,” IEEE Trans Automatic Control, 57(3):565-578, March
2012.

[6] W. Saad, Z. Han, T. Başar, M. Debbah, and A. Hjørungnes, “Hedonic coalition formation for distributed
task allocation among wireless agents,” IEEE Trans Mobile Computing, 10(9):1327-1344, September
2011.

[7] S. Yüksel and T. Başar, “Control over noisy forward and reverse channels,” IEEE Trans Automatic
Control, 56(5):1014-1029, May 2011.

[8] T. Başar and Q. Zhu, “Prices of anarchy, information, and cooperation in differential games,” J Dynamic
Games and Applications, 1(1):50-73, March 2011.

[9] S. Bhattacharya and T. Başar, “Spatial approaches to broadband jamming in heterogeneous mobile
networks: A game-theoretic approach,” J. Autonomous Robots (Special Issue on Search and
Pursuit-evasion with Mobile Robots), 31(4):367-381, 2011.

[10] T. Başar, A. Bensoussan, and S.P. Sethi, “Differential games with mixed leadership: The open-loop
solution,” J Applied Mathematics and Computation, 217:972-979, 2010.

[11] O.C. Imer and T. Başar, “Optimal estimation with limited measurements,” IJSCC (International J.
Systems, Control and Communications) Special Issue on Information Processing and Decision Making
in Distributed Control Systems, 2(1/2/3):5-29, 2010.

[12] A. Gupta, P. Grover, C. Langbort, and T. Başar, “On myopic strategies in dynamic adversarial team
decision problems,” Proc. 46th Annual Conference on Information Sciences and Systems (CISS’12),
Princeton, NJ, March 21-23, 2012.

[13] A. Gupta, C. Langbort, and T. Başar, “One-stage control over an adversarial channel with finite
codewords,” Proc. 50th IEEE Conference on Decision and Control and European Control Conference
(CDC/ECC’11), Orlando, Florida, Dec 12-15, 2011, pp. 4072-4078.